Identifying Different Meanings of a Chinese Morpheme through Latent Semantic Analysis and Minimum Spanning Tree Analysis
Authors
Abstract
In Chinese, a character corresponds roughly to a morpheme and usually takes on multiple meanings. In this paper, we aim to capture the multiple meanings of a Chinese morpheme across polymorphemic words in a growing semantic micro-space. Using Latent Semantic Analysis (LSA), we created several nested LSA semantic micro-spaces of increasing size. The term-document matrix of the smallest semantic space was obtained by filtering a whole corpus with a list of 192 Chinese polymorphemic words sharing a common morpheme (gong1). For each of the Chinese LSA spaces we created, we computed the full cosine matrix over all terms of the space to measure semantic similarity between words. From the cosine matrix, we derived a dissimilarity matrix, which we treated as the adjacency matrix of a complete weighted undirected graph. From this graph we built a minimum spanning tree (MST), so that each LSA semantic space has its associated MST. We show that, in our largest MST, paths can be used to infer and capture the correct meaning of a morpheme embedded in a polymorphemic word, and that clusters of the different meanings of a polysemous morpheme can be created from the minimum spanning tree. We conclude that our approach could partly model human knowledge representation and acquisition of the different meanings of Chinese polysemous morphemes. Our work may bring some insight into Plato's problem and provide additional evidence for the plausibility of words serving as ungrounded symbols. Future directions are sketched.
IJCLA Vol. 1, No. 1-2, Jan-Dec 2010, pp. 153-168. Received 24/11/09; accepted 16/01/10; final 11/03/10.
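The steps from cosine matrix to MST paths can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the term vectors have already been produced by LSA (the toy two-dimensional vectors below are invented for illustration), it assumes the dissimilarity is defined as 1 minus the cosine (the abstract does not give the formula), and it uses Prim's algorithm as one standard way to build an MST. All function names are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def dissimilarity_matrix(vectors):
    """Dense matrix of 1 - cosine (assumed definition of dissimilarity)."""
    n = len(vectors)
    return [[0.0 if i == j else 1.0 - cosine(vectors[i], vectors[j])
             for j in range(n)] for i in range(n)]

def minimum_spanning_tree(dist):
    """Prim's algorithm on a complete graph given as a dense matrix.

    Returns a list of (parent, child, weight) edges.
    """
    n = len(dist)
    in_tree = [False] * n
    best = [math.inf] * n      # cheapest known edge into each vertex
    parent = [-1] * n
    best[0] = 0.0              # start growing the tree from vertex 0
    edges = []
    for _ in range(n):
        u = min((k for k in range(n) if not in_tree[k]), key=lambda k: best[k])
        in_tree[u] = True
        if parent[u] != -1:
            edges.append((parent[u], u, dist[parent[u]][u]))
        for v in range(n):
            if not in_tree[v] and dist[u][v] < best[v]:
                best[v] = dist[u][v]
                parent[v] = u
    return edges

def tree_path(edges, start, goal):
    """Unique path between two vertices of the tree (depth-first search)."""
    adj = {}
    for i, j, _ in edges:
        adj.setdefault(i, []).append(j)
        adj.setdefault(j, []).append(i)
    stack = [(start, [start])]
    seen = {start}
    while stack:
        node, path = stack.pop()
        if node == goal:
            return path
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, path + [nxt]))
    return None

# Toy term vectors (invented): indices 0-1 point one way, 2-3 another.
vectors = [(1.0, 0.0), (0.9, 0.1), (0.1, 0.9), (0.0, 1.0)]
dist = dissimilarity_matrix(vectors)
edges = minimum_spanning_tree(dist)
path = tree_path(edges, 0, 3)
```

In this toy example the tree links each term to its nearest neighbour, so the path between terms 0 and 3 passes through the intermediate terms 1 and 2. Reading such paths is the kind of inspection the abstract describes for inferring which sense of the morpheme a word carries.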
Similar resources
Identifying Different Meanings of a Chinese Morpheme through Semantic Pattern Matching in Augmented Minimum Spanning Trees
The aim of this paper is to explore the feasibility of solving the dependency parsing problem using sequence labeling tools. We introduce an algorithm to transform a dependency tree into a tag sequence suitable for a sequence labeling algorithm and evaluate several parameter settings on the standard treebank data. We focus mainly on Czech, as a high-inflective freeword-order language, which is ...
A Metaheuristic Algorithm for the Minimum Routing Cost Spanning Tree Problem
The routing cost of a spanning tree in a weighted, connected graph is defined as the total length of the paths between all pairs of vertices. The objective of the minimum routing cost spanning tree problem is to find a spanning tree whose routing cost is minimum. This is an NP-hard problem, for which we present a GRASP with path-relinking metaheuristic algorithm. GRASP is a multi-start ...
Guided Local Search for Query Reformulation Using Weight Propagation
A new technique for query reformulation that assesses the relevance of retrieved documents using weight propagation is proposed. The technique uses a Guided Local Search (GLS) in conjunction with the latent semantic indexing model (to semantically cluster documents together) and Lexical Matching (LM). The GLS algorithm is used to construct a minimum spanning tree that is later employed in the r...
A Study on Semantic Word-Formation Rules of Chinese Nouns from the Perspective of Generative Lexicon Theory: A Case Study of Undirected Disyllable Compounds
This paper applies qualia structure theory to the study of compound nouns whose lexical meanings cannot be inferred from their morpheme meanings, taking as examples some undirected disyllabic compound nouns from the Chinese Semantic Word-formation Database. The paper identifies some specific ways in which morpheme meanings can be integrated with lexical meanings. It is hoped that the...
Clustering Web Search Results with Maximum Spanning Trees
We present a novel method for clustering Web search results based on Word Sense Induction. First, we acquire the meanings of a query by means of a graph-based clustering algorithm that calculates the maximum spanning tree of the co-occurrence graph of the query. Then we cluster the search results based on their semantic similarity to the induced word senses. We show that our approach improves c...